Did You Miss the Sign? A False Negative Alarm System for Traffic Sign Detectors
Object detection is an integral part of an autonomous vehicle for its
safety-critical and navigational purposes. Traffic signs as objects play a
vital role in guiding such systems. However, if the vehicle fails to locate a
critical sign, the result can be a catastrophic failure. In this paper, we propose
an approach to identify traffic signs that have been mistakenly discarded by
the object detector. The proposed method raises an alarm when it discovers a
failure by the object detector to detect a traffic sign. This approach can be
useful to evaluate the performance of the detector during the deployment phase.
We trained a single shot multi-box object detector to detect traffic signs and
used its internal features to train a separate false negative detector (FND).
During deployment, FND decides whether the traffic sign detector (TSD) has
missed a sign or not. We use precision and recall to measure the accuracy
of FND on two different datasets. At 80% recall, FND achieves 89.9%
precision on the Belgian Traffic Sign Detection dataset and 90.8% precision on
the German Traffic Sign Recognition Benchmark dataset. To the best of
our knowledge, our method is the first to tackle this critical aspect of false
negative detection in robotic vision. Such a fail-safe mechanism for object
detection can improve the engagement of robotic vision systems in our daily
life.
Comment: Submitted to the 2019 IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS 2019).
Dropout Sampling for Robust Object Detection in Open-Set Conditions
Dropout Variational Inference, or Dropout Sampling, has been recently
proposed as an approximation technique for Bayesian Deep Learning and evaluated
for image classification and regression tasks. This paper investigates the
utility of Dropout Sampling for object detection for the first time. We
demonstrate how label uncertainty can be extracted from a state-of-the-art
object detection system via Dropout Sampling. We evaluate this approach on a
large synthetic dataset of 30,000 images, and a real-world dataset captured by
a mobile robot in a versatile campus environment. We show that this uncertainty
can be utilized to increase object detection performance under the open-set
conditions that are typically encountered in robotic vision. A Dropout Sampling
network is shown to achieve a 12.3% increase in recall (for the same precision
score as a standard network) and a 15.1% increase in precision (for the same
recall score as the standard network).
Comment: to appear in IEEE International Conference on Robotics and Automation
2018 (ICRA 2018).
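The core of Dropout Sampling is to keep dropout active at test time and treat repeated forward passes as samples from an approximate posterior. A minimal sketch, where the `forward_pass` callable stands in for a real detector head with dropout enabled:

```python
import numpy as np

def dropout_sampling_uncertainty(forward_pass, image, n_samples=20):
    """Run the detector n_samples times with dropout still active,
    average the per-class scores, and report the entropy of the mean
    as a label-uncertainty measure (a sketch; associating sampled
    bounding boxes across passes is handled separately in detection)."""
    samples = np.stack([forward_pass(image) for _ in range(n_samples)])
    mean_scores = samples.mean(axis=0)
    entropy = -float(np.sum(mean_scores * np.log(mean_scores + 1e-12)))
    return mean_scores, entropy
```

Detections with high entropy can then be rejected as likely unknown objects, which is what raises precision and recall under open-set conditions.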
Evaluating Merging Strategies for Sampling-based Uncertainty Techniques in Object Detection
There has been a recent emergence of sampling-based techniques for estimating
epistemic uncertainty in deep neural networks. While these methods can be
applied to classification or semantic segmentation tasks by simply averaging
samples, this is not the case for object detection, where detection sample
bounding boxes must be accurately associated and merged. A weak merging
strategy can significantly degrade the performance of the detector and yield an
unreliable uncertainty measure. This paper provides the first in-depth
investigation of the effect of different association and merging strategies. We
compare different combinations of three spatial and two semantic affinity
measures with four clustering methods for MC Dropout with a Single Shot
Multi-Box Detector. Our results show that the correct choice of
affinity-clustering combination can greatly improve the effectiveness of the
classification and spatial uncertainty estimation and the resulting object
detection performance. We base our evaluation on a new mix of datasets that
emulate near open-set conditions (semantically similar unknown classes),
distant open-set conditions (semantically dissimilar unknown classes) and the
common closed-set conditions (only known classes).
Comment: to appear in IEEE International Conference on Robotics and Automation
2019 (ICRA 2019).
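One affinity-clustering combination of the kind compared above can be sketched with IoU as the spatial affinity measure and a simple sequential clustering scheme. The threshold and merge-by-mean rule are illustrative choices, not the paper's recommended settings:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union spatial affinity between two [x1,y1,x2,y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def greedy_cluster(boxes, iou_thresh=0.5):
    """Assign each sampled box to the first cluster whose representative
    (its first member) it overlaps above iou_thresh; otherwise start a
    new cluster. Clusters are merged by averaging their boxes."""
    clusters = []
    for b in boxes:
        for c in clusters:
            if iou(b, c[0]) >= iou_thresh:
                c.append(b)
                break
        else:
            clusters.append([b])
    return [np.mean(c, axis=0) for c in clusters]
```

The spread of boxes and class scores within each resulting cluster then gives the spatial and classification uncertainty estimates, which is why a weak affinity or clustering choice degrades both.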
Multi-Modal Trip Hazard Affordance Detection On Construction Sites
Trip hazards are a significant contributor to accidents on construction and
manufacturing sites, where over a third of Australian workplace injuries occur
[1]. Current safety inspections are labour intensive and limited by human
fallibility, making automation of trip hazard detection appealing from both a
safety and economic perspective. Trip hazards present an interesting challenge
to modern learning techniques because they are defined as much by affordance as
by object type; for example wires on a table are not a trip hazard, but can be
if lying on the ground. To address these challenges, we conduct a comprehensive
investigation into the performance characteristics of 11 different colour and
depth fusion approaches, including four fusion and one non-fusion approach, using
colour and two types of depth images. Trained and tested on over 600 labelled
trip hazards over 4 floors and 2000 m in an active construction
site, this approach was able to differentiate between identical objects in
different physical configurations (see Figure 1). Outperforming a colour-only
detector, our multi-modal trip detector fuses colour and depth information to
achieve a 4% absolute improvement in F1-score. These investigative results and
the extensive publicly available dataset move us one step closer to assistive
or fully automated safety inspection systems on construction sites.
Comment: 9 Pages, 12 Figures, 2 Tables, Accepted to Robotics and Automation
Letters (RA-L).
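As a concrete illustration of one fusion point among those compared, early fusion simply stacks depth as an extra input channel before the network. This is a generic sketch of the idea, not the specific architecture evaluated in the paper:

```python
import numpy as np

def early_fuse(rgb, depth):
    """Early colour-depth fusion: append the depth map as a fourth
    channel so a single network receives both modalities. Later-stage
    fusion would instead merge feature maps inside the network."""
    assert rgb.shape[:2] == depth.shape[:2], "modalities must be registered"
    return np.concatenate([rgb, depth[..., None]], axis=-1)
```

Because depth encodes where an object sits relative to the ground plane, the fused input is what lets a detector separate a wire on a table from the same wire lying on the floor.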
Learning Deployable Navigation Policies at Kilometer Scale from a Single Traversal
Model-free reinforcement learning has recently been shown to be effective at
learning navigation policies from complex image input. However, these
algorithms tend to require large amounts of interaction with the environment,
which can be prohibitively costly to obtain on robots in the real world. We
present an approach for efficiently learning goal-directed navigation policies
on a mobile robot, from only a single coverage traversal of recorded data. The
navigation agent learns an effective policy over a diverse action space in a
large heterogeneous environment consisting of more than 2km of travel, through
buildings and outdoor regions that collectively exhibit large variations in
visual appearance, self-similarity, and connectivity. We compare pretrained
visual encoders that enable precomputation of visual embeddings to achieve a
throughput of tens of thousands of transitions per second at training time on a
commodity desktop computer, allowing agents to learn from millions of
trajectories of experience in a matter of hours. We propose multiple forms of
computationally efficient stochastic augmentation to enable the learned policy
to generalise beyond these precomputed embeddings, and demonstrate successful
deployment of the learned policy on the real robot without fine tuning, despite
environmental appearance differences at test time. The dataset and code
required to reproduce these results and apply the technique to other datasets
and robots are made publicly available at rl-navigation.github.io/deployable.
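The precomputation idea above can be sketched as follows: encode every frame of the traversal once, then let training sample transitions by pure index lookup, so throughput is bounded by memory access rather than by the encoder. The encoder and transition layout here are illustrative assumptions:

```python
import numpy as np

class PrecomputedEmbeddingBuffer:
    """Encode each recorded frame once, then serve (state, next_state)
    transition pairs by index lookup, so policy updates never invoke
    the visual encoder (a sketch of the precomputation idea)."""

    def __init__(self, images, encoder):
        # One-off cost: embed the whole traversal up front.
        self.embeddings = np.stack([encoder(img) for img in images])

    def sample(self, rng, batch_size):
        # Consecutive frames form (s, s') pairs; sampling is pure indexing.
        idx = rng.integers(0, len(self.embeddings) - 1, size=batch_size)
        return self.embeddings[idx], self.embeddings[idx + 1]
```

Any stochastic augmentation used to keep the policy from overfitting to the fixed embeddings would be applied to the returned vectors at sample time, preserving the lookup-speed advantage.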
Contrastive Language, Action, and State Pre-training for Robot Learning
In this paper, we introduce a method for unifying language, action, and state
information in a shared embedding space to facilitate a range of downstream
tasks in robot learning. Our method, Contrastive Language, Action, and State
Pre-training (CLASP), extends the CLIP formulation by incorporating
distributional learning, capturing the inherent complexities and one-to-many
relationships in behaviour-text alignment. By employing distributional outputs
for both text and behaviour encoders, our model effectively associates diverse
textual commands with a single behaviour and vice-versa. We demonstrate the
utility of our method for the following downstream tasks: zero-shot
text-behaviour retrieval, captioning unseen robot behaviours, and learning a
behaviour prior for language-conditioned reinforcement learning. Our
distributional encoders exhibit superior retrieval and captioning performance
on unseen datasets, and the ability to generate meaningful exploratory
behaviours from textual commands, capturing the intricate relationships between
language, action, and state. This work represents an initial step towards
developing a unified pre-trained model for robotics, with the potential to
generalise to a broad range of downstream tasks.
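A distributional contrastive objective of the flavour described can be sketched by giving each encoder Gaussian outputs and scoring pairs with a distance between distributions. The squared 2-Wasserstein distance and the InfoNCE form below are illustrative choices, not necessarily the paper's exact formulation:

```python
import numpy as np

def w2_distance(mu_a, var_a, mu_b, var_b):
    """Squared 2-Wasserstein distance between diagonal Gaussians, one
    convenient dissimilarity for distributional embeddings."""
    return np.sum((mu_a - mu_b) ** 2) + np.sum((np.sqrt(var_a) - np.sqrt(var_b)) ** 2)

def info_nce(text_dists, behav_dists, temperature=0.1):
    """Contrastive objective over matched text/behaviour distributions:
    pull each matched pair i together, push it apart from the other
    behaviours in the batch. Each entry is a (mean, variance) tuple."""
    n = len(text_dists)
    logits = np.array([[-w2_distance(*text_dists[i], *behav_dists[j]) / temperature
                        for j in range(n)] for i in range(n)])
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Because both sides are distributions rather than points, several distinct commands can sit within one behaviour's high-likelihood region, which is how the one-to-many structure of behaviour-text alignment is accommodated.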
Density-aware NeRF Ensembles: Quantifying Predictive Uncertainty in Neural Radiance Fields
We show that ensembling effectively quantifies model uncertainty in Neural
Radiance Fields (NeRFs) if a density-aware epistemic uncertainty term is
considered. The naive ensembles investigated in prior work simply average
rendered RGB images to quantify the model uncertainty caused by conflicting
explanations of the observed scene. In contrast, we additionally consider the
termination probabilities along individual rays to identify epistemic model
uncertainty due to a lack of knowledge about the parts of a scene unobserved
during training. We achieve new state-of-the-art performance across established
uncertainty quantification benchmarks for NeRFs, outperforming methods that
require complex changes to the NeRF architecture and training regime. We
furthermore demonstrate that NeRF uncertainty can be utilised for next-best
view selection and model refinement.
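The density-aware term can be sketched with standard volume-rendering weights: a ray whose termination probabilities do not sum to one passes through unobserved space, and that leftover transmittance is added to the ensemble's RGB disagreement. A simplified per-ray sketch:

```python
import numpy as np

def ray_weights(sigmas, deltas):
    """Volume-rendering termination probabilities along one ray:
    w_i = T_i * (1 - exp(-sigma_i * delta_i)), with transmittance
    T_i = prod_{j<i} exp(-sigma_j * delta_j)."""
    alphas = 1.0 - np.exp(-sigmas * deltas)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    return trans * alphas

def density_aware_uncertainty(rgb_samples, sigma_samples, deltas):
    """Per-ray ensemble uncertainty: RGB variance across members plus a
    density-aware term from the leftover transmittance 1 - sum(w), which
    flags rays that terminate nowhere (unobserved scene regions).
    A sketch of the idea, not the paper's exact estimator."""
    rgb_var = float(np.var(rgb_samples, axis=0).mean())
    leftover = [1.0 - ray_weights(s, deltas).sum() for s in sigma_samples]
    return rgb_var + float(np.mean(leftover))
```

A naive RGB-averaging ensemble reports low uncertainty on such unterminated rays because all members render the same (empty) colour; the leftover-transmittance term is what exposes them.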